Decompose algorithm for thresholding degraded historical document images
Abstract
Numerous techniques have previously been proposed for single-stage thresholding of document images to separate the written or printed information from the background. Although these global or local thresholding techniques have proven effective on particular subclasses of documents, none is able to produce consistently good results across the wide range of document image qualities that exist in general, or across the image qualities encountered in degraded historical documents. A new thresholding structure called the decompose algorithm is proposed and compared against some existing single-stage algorithms. The decompose algorithm uses local feature vectors to analyse a local area and find the best approach to thresholding it. Instead of employing a single thresholding algorithm, it automatically selects an appropriate algorithm for each type of subregion of the document. The original image is recursively broken down into subregions using quad-tree decomposition until a suitable thresholding method can be applied to each subregion. The algorithm has been trained using 300 historical images obtained from the Library of Congress and evaluated on 300 ‘difficult’ document images, also extracted from the Library of Congress, in which considerable background noise or variation in contrast and illumination exists. A quantitative analysis of the results, measured by text recall, and a qualitative assessment of the processed document image quality are reported. The decompose algorithm is demonstrated to be effective at thresholding historical images of varying quality.
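The recursive quad-tree step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature vectors or the rules used to pick a thresholding method, so everything below is an assumption made for illustration: the standard deviation of a region stands in for the local feature vector, the cut-offs `flat_std`, `contrast_std` and `min_size` are hypothetical, and Otsu's method stands in for the set of candidate thresholding algorithms. It is not the authors' implementation.

```python
import numpy as np


def otsu_threshold(region: np.ndarray) -> int:
    """Return the Otsu threshold (0-255) of a uint8 grayscale region."""
    hist = np.bincount(region.ravel(), minlength=256).astype(np.float64)
    cum_count = np.cumsum(hist)                    # pixels with value <= t
    cum_sum = np.cumsum(hist * np.arange(256))     # intensity mass <= t
    w0 = cum_count
    w1 = cum_count[-1] - cum_count
    valid = (w0 > 0) & (w1 > 0)                    # both classes non-empty
    between = np.zeros(256)
    mu0 = cum_sum[valid] / w0[valid]
    mu1 = (cum_sum[-1] - cum_sum[valid]) / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return int(np.argmax(between))                 # maximise between-class variance


def _decompose(image, out, y, x, h, w, min_size, flat_std, contrast_std):
    region = image[y:y + h, x:x + w]
    std = region.std()                             # assumed stand-in for the feature vector
    if std < flat_std:
        # Assumed rule: a near-uniform region is all background or all ink.
        out[y:y + h, x:x + w] = 255 if region.mean() >= 128 else 0
    elif std >= contrast_std or min(h, w) <= min_size:
        # Assumed rule: enough contrast (or region too small to split) -> Otsu.
        t = otsu_threshold(region)
        out[y:y + h, x:x + w] = np.where(region > t, 255, 0).astype(np.uint8)
    else:
        # No method judged suitable yet: split into four quadrants and recurse.
        h2, w2 = h // 2, w // 2
        for qy, qx, qh, qw in ((y, x, h2, w2), (y, x + w2, h2, w - w2),
                               (y + h2, x, h - h2, w2),
                               (y + h2, x + w2, h - h2, w - w2)):
            _decompose(image, out, qy, qx, qh, qw, min_size, flat_std, contrast_std)


def decompose_threshold(image, min_size=16, flat_std=10.0, contrast_std=40.0):
    """Binarize a uint8 grayscale image by quad-tree decomposition (sketch)."""
    out = np.empty_like(image)
    _decompose(image, out, 0, 0, image.shape[0], image.shape[1],
               min_size, flat_std, contrast_std)
    return out
```

Calling `decompose_threshold(page)` on a 2-D `uint8` array returns an array of the same shape containing only 0 (ink) and 255 (background). In a fuller system the two assumed rules would be replaced by a trained selector that maps each region's feature vector to one of several thresholding methods.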
Related resources
Binarization of Document Image
Document image binarization is performed in the preprocessing stage of document analysis and aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR). Though document image binarization has been studied for many years, t...
A Review on Global Binarization Algorithms for Degraded Document Images
Several algorithms have previously been proposed for improving the thresholding of degraded document images. No algorithm can solve all types of problems, but some algorithms are better than others for specific situations. This article reviews global binarization algorithms for improving degraded document images, thus indicating their differences and similarities, and also their advantages and ...
Restoration of Degraded Historical Document Image: An Adaptive Multilayer-Information Binarization Technique
Binary image is the essential format for document image processing, and the operation of the subsequent steps depends on the quality of the binarization process. The objective of this research is to propose a new binarization method based on adaptive multilayer-information for restoration of degraded historical document images. This paper focuses on degraded Thai historical document images, whi...
Effective Thresholding of Ancient Degraded Manuscript Folio Images
Thresholding is an essential procedure used in image segmentation and binarization applications. In this paper, segmentation methods applied on document images for separating the text from background presents pure binarization and filtering combined with image processing algorithms. This paper describes a contrast based thresholding method for old degraded manuscript images. It is an approach f...
Multispectral Image Restoration of Historical Document Images
Culture is preserved through various documents which is a part of the civilization and heritage. Due to extinction and single document copies available for the future generations about the ancient scripts, the archiving of these documents in the digital process is the solution for these problems. In this paper, the aim is to restore the historical document from tears, stains and poor visibility...
Publication date: 2000